Description: In agriculture, identifying plant diseases and taking the right precautions against them is very important. The aim of this project is to address this problem by building a model that predicts a plant's disease from an input image.

Steps to follow:

  1. Unzip the data
  2. Place images of the same kind in their own directory
  3. Build the convolutional model
  4. Evaluate and test the model

2. Creating directories to keep each kind of image in its own directory

Steps to follow

  1. Get the directory of images
  2. Load the train and test CSV files
  3. Visualize a few of the images to become one with the data
  4. Send each image to its own directory, namely Healthy, Multiple diseases, Rust, or Scab

Make a new column, Labels, in our train dataframe as follows:

  1. If the row has healthy=1, set the value in the label column to 0.
  2. If the row has multiple-diseases=1, set the value in the label column to 1.
  3. If the row has rust=1, set the value in the label column to 2.
  4. If the row has scab=1, set the value in the label column to 3.
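
The mapping above can be sketched with pandas as follows. This is a minimal sketch, not the notebook's own code: it assumes the one-hot columns are named `healthy`, `multiple_diseases`, `rust`, and `scab`, the output column name `label` is also an assumption, and the small `train_df` is made-up illustration data rather than the real CSV.

```python
import pandas as pd

# Assumed one-hot column names, in the order that defines labels 0..3.
CLASS_COLUMNS = ["healthy", "multiple_diseases", "rust", "scab"]


def add_label_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an integer `label` column (0..3)."""
    out = df.copy()
    name_to_label = {name: i for i, name in enumerate(CLASS_COLUMNS)}
    # idxmax(axis=1) gives the name of the column holding the 1 in each row;
    # the dictionary then maps that name to its integer label.
    out["label"] = out[CLASS_COLUMNS].idxmax(axis=1).map(name_to_label)
    return out


# Tiny made-up example, not the real training data.
train_df = pd.DataFrame(
    {
        "image_id": ["Train_0", "Train_1", "Train_2", "Train_3"],
        "healthy": [0, 1, 0, 0],
        "multiple_diseases": [0, 0, 1, 0],
        "rust": [0, 0, 0, 1],
        "scab": [1, 0, 0, 0],
    }
)
labelled = add_label_column(train_df)
print(labelled[["image_id", "label"]])
```

Keeping the column order in one list means the integer labels and the class directories can be derived from the same source.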

Steps to follow to prepare the data

  1. Loop through all the images one by one after sorting them
  2. Find the label of each image and send it to its prescribed directory
  3. To send images to their prescribed directories, we must first create those directories
  4. Create the directories and call the Create_train_data() function
  5. And that's it! The data is prepared
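
The preparation steps above could look roughly like the sketch below. It is not the notebook's `Create_train_data()` function: `copy_images_to_class_dirs` is a hypothetical stand-in, the `.jpg` extension and the `labels` dictionary (image file stem to integer label) are assumptions, and the demo uses empty placeholder files in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

# Class directory names, indexed by integer label 0..3.
CLASS_NAMES = ["Healthy", "Multiple diseases", "Rust", "Scab"]


def copy_images_to_class_dirs(image_dir, labels, class_names, out_dir):
    """Copy each labelled image into out_dir/<class name>/ and return the count."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    # Create one directory per class before copying anything.
    for name in class_names:
        (out_dir / name).mkdir(parents=True, exist_ok=True)

    copied = 0
    # Sort so the images are processed in a deterministic order.
    for image_path in sorted(image_dir.glob("*.jpg")):
        label = labels.get(image_path.stem)
        if label is None:
            continue  # no label known for this file, so skip it
        shutil.copy2(image_path, out_dir / class_names[label] / image_path.name)
        copied += 1
    return copied


# Demo with empty placeholder files instead of real images.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "images"
    src.mkdir()
    for stem in ["Train_0", "Train_1"]:
        (src / f"{stem}.jpg").write_bytes(b"")
    n_copied = copy_images_to_class_dirs(
        src, {"Train_0": 0, "Train_1": 3}, CLASS_NAMES, Path(tmp) / "sorted"
    )
    scab_files = sorted(p.name for p in (Path(tmp) / "sorted" / "Scab").iterdir())
print(n_copied, scab_files)
```

Copying rather than moving keeps the original image folder intact, which makes it easy to rerun the step if the labels change.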

Now create folders using the train_set and test_set, and send the images in the Train and Test directories to their respective locations!

Becoming one with the data

  1. Get the class names programmatically
  2. Analyze the number of images in each directory
  3. Randomly visualize a few of the images in each category
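
A minimal sketch of these three steps, assuming one sub-directory per class. The helper names `count_images_per_class` and `sample_image_paths` are hypothetical, the demo directory is a temporary one filled with empty placeholder files, and the plotting itself is left out: the sketch only picks which image paths to show.

```python
import random
import tempfile
from pathlib import Path


def count_images_per_class(data_dir):
    """Return {class name: number of files} for each sub-directory of data_dir."""
    class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    return {d.name: sum(1 for f in d.iterdir() if f.is_file()) for d in class_dirs}


def sample_image_paths(data_dir, class_name, k, seed=None):
    """Pick up to k random image paths from one class directory."""
    files = sorted(f for f in (Path(data_dir) / class_name).iterdir() if f.is_file())
    return random.Random(seed).sample(files, min(k, len(files)))


# Demo on a throwaway directory tree with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in {"Healthy": 3, "Rust": 2}.items():
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        for i in range(n_files):
            (class_dir / f"img_{i}.jpg").write_bytes(b"")
    counts = count_images_per_class(tmp)
    class_names = list(counts)  # class names come from the directory names
    picked = sample_image_paths(tmp, "Healthy", k=2, seed=0)
print(class_names, counts)
```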

Building the model

  1. Create test and valid data batches
  2. Create a data augmentation layer
  3. Try an EfficientNet model using transfer learning
  4. Fine-tune the model by unfreezing some of the layers
  5. Check how well fine-tuning is working
  6. Evaluate the model with test images

Now we can clearly see that both the training and validation loss are decreasing and both the training and validation accuracy are increasing. This suggests that training for more epochs could increase the accuracy further without overfitting.

Now let's fine-tune the model

  1. Unfreeze all the layers
  2. Then freeze all the layers except the last 5
  3. Increase the learning rate
  4. Recompile the model
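
The first two steps can be sketched independently of any framework, since they only toggle a boolean `trainable` flag on each layer (the attribute Keras layers expose). The helper name `unfreeze_last_n` is hypothetical, and the demo uses plain stand-in objects instead of a real model.

```python
from types import SimpleNamespace


def unfreeze_last_n(layers, n):
    """Mark only the last n layers as trainable and freeze the rest.

    Works on any sequence of objects with a boolean `trainable` attribute.
    Returns the resulting list of flags for inspection.
    """
    layers = list(layers)
    for layer in layers:
        layer.trainable = True  # step 1: unfreeze everything
    to_freeze = layers[:-n] if n > 0 else layers
    for layer in to_freeze:
        layer.trainable = False  # step 2: freeze all but the last n
    return [layer.trainable for layer in layers]


# Stand-in "layers" for the demo; a real run would pass the model's layers.
fake_layers = [SimpleNamespace(name=f"layer_{i}", trainable=False) for i in range(8)]
flags = unfreeze_last_n(fake_layers, 5)
print(flags)
```

Steps 3 and 4 depend on the framework. With Keras they would typically be a fresh `model.compile(...)` call with the chosen learning rate, since changes to `trainable` only take effect after recompiling.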

We can see that fine-tuning did not go that well. The reason may be that we have too little data, so it is better to build the model with all the layers frozen.

Checking where our model made the most wrong predictions

  1. Unravel the test_data so we can use the original labels to track how well the model is performing
  2. Plot the confusion matrix to find where the model makes the most errors
  3. Get the F1-scores and plot them in descending order
  4. Find the most wrong predictions, that is, the wrong predictions with the highest prediction probability
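
To make the confusion-matrix step concrete, here is a small NumPy sketch that counts (true class, predicted class) pairs. It assumes integer class labels 0..3, and `y_true` and `y_pred` are made-up values for illustration, not results from the model.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(matrix, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return matrix


# Made-up labels: 0=Healthy, 1=Multiple diseases, 2=Rust, 3=Scab.
y_true = [0, 1, 1, 2, 3, 3]
y_pred = [0, 2, 3, 2, 3, 3]
cm = confusion_matrix(y_true, y_pred, num_classes=4)

# Off-diagonal entries in a row are the errors made on that true class.
errors_per_class = cm.sum(axis=1) - np.diag(cm)
print(cm)
print(errors_per_class)
```

Reading the matrix row by row shows which true class is misclassified most often and which classes it is confused with.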

In the code above, the labels column of the test data is taken and converted into a NumPy array, which typically looks like [1, 0, 0, 0, ..., 0], similar to a one-hot encoding.

We can observe that the model makes the most wrong predictions in the case of multiple diseases

  1. This may be due to wrong labelling of images
  2. Or one of the diseases may be dominating the other
  3. Now let's visualize the most wrongly predicted images to get a more detailed understanding
  4. Before doing that, let's check various metrics and plot the F1-scores of each class from highest to lowest

In order to visualize the F1-scores

  1. Create a DataFrame containing Class_names and F1-scores columns
  2. To create this DataFrame, we first have to get the classification_report as a dictionary
  3. Loop through the dictionary and build a dictionary of key-value pairs containing class names and F1-scores, excluding accuracy, macro avg, and weighted avg
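
The steps above can be sketched as follows. The `report` dictionary is hand-written in the shape scikit-learn's `classification_report(..., output_dict=True)` returns, so that the sketch runs without a trained model. Its numbers are invented for illustration and are not the notebook's results, and the helper name `f1_scores_frame` and the lowercase column names are assumptions.

```python
import pandas as pd

# Summary entries in the report that are not per-class rows.
SUMMARY_KEYS = {"accuracy", "macro avg", "weighted avg"}


def f1_scores_frame(report: dict) -> pd.DataFrame:
    """Build a DataFrame of per-class F1-scores, sorted from highest to lowest."""
    f1_by_class = {
        name: metrics["f1-score"]
        for name, metrics in report.items()
        if name not in SUMMARY_KEYS
    }
    frame = pd.DataFrame(
        {"class_names": list(f1_by_class), "f1_scores": list(f1_by_class.values())}
    )
    return frame.sort_values("f1_scores", ascending=False).reset_index(drop=True)


# Invented report values, for illustration only.
report = {
    "Healthy": {"precision": 0.96, "recall": 0.94, "f1-score": 0.95, "support": 50},
    "Multiple diseases": {"precision": 0.50, "recall": 0.33, "f1-score": 0.40, "support": 9},
    "Rust": {"precision": 0.91, "recall": 0.93, "f1-score": 0.92, "support": 60},
    "Scab": {"precision": 0.89, "recall": 0.91, "f1-score": 0.90, "support": 58},
    "accuracy": 0.90,
    "macro avg": {"precision": 0.82, "recall": 0.78, "f1-score": 0.79, "support": 177},
    "weighted avg": {"precision": 0.90, "recall": 0.90, "f1-score": 0.90, "support": 177},
}
f1_frame = f1_scores_frame(report)
print(f1_frame)
```

The sorted frame can then be passed to any bar-plot routine to show the classes from best to worst.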

In order to visualize the most wrongly predicted images, we follow these steps:

  1. Create a DataFrame containing the following columns
  2. Create a new column, pred_status, which shows whether the prediction is true or false
  3. Select all the rows in the DataFrame with pred_status false and sort them in descending order of prediction probability
  4. Visualize the images
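
A minimal pandas sketch of the selection logic. Only `pred_status` is named in the text, so the other column names (`img_path`, `y_true`, `y_pred`, `pred_prob`) are assumptions, and the rows are made-up values rather than real predictions.

```python
import pandas as pd

# Made-up predictions; pred_prob is the probability of the predicted class.
pred_df = pd.DataFrame(
    {
        "img_path": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
        "y_true": [0, 1, 2, 3],
        "y_pred": [0, 2, 2, 1],
        "pred_prob": [0.98, 0.91, 0.75, 0.60],
    }
)

# True where the prediction matches the original label.
pred_df["pred_status"] = pred_df["y_true"] == pred_df["y_pred"]

# Keep only the wrong predictions, most confident mistakes first.
most_wrong = pred_df[~pred_df["pred_status"]].sort_values("pred_prob", ascending=False)
print(most_wrong)
```

The rows at the top of `most_wrong` are the images the model was most confident about and still got wrong, which makes them the most useful ones to inspect by eye.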

Here we can clearly see that the most wrong predictions are due to wrong labelling and the model getting confused, so more samples should be added for the model to work well.

Hurrah! This is the best model we have built so far.